Building an R Package

From Scripts to Standardized Open-Source Tools

David Munoz Tord - Senior R Developer

Cytel FSP - Johnson & Johnson Innovative Medicine

August 22, 2025

Life in Pharma: Tables, Tables Everywhere

Case Study: junco Clinical Trial Analysis Package

Example transformation of repetitive clinical trial analysis scripts:

  1. Identify common functions across multiple studies
  2. Standardize input/output formats
  3. Create consistent documentation
  4. Implement validation tests
  5. Open-source the package

Challenges in Pharmaceutical Programming

The pharmaceutical industry faces unique challenges in clinical and statistical programming:

  • Implementing patterns used across many table shells (developed by dedicated teams)
  • Company-specific statistical methods that need to be standardized
  • Complex table structures that must be consistent across studies
  • Need for a core framework that ensures all company shells can be created consistently

→ That's where I step in!

Shells Are Table Specific, Table Creation Is Not

The Junco Package Approach

Our solution was to develop our own business logic framework for J&J table creation

Getting to Production

The junco package provides key features needed for production-ready clinical tables:

  • TrueType font support: Word wrapping, pagination, and RTF export
  • Higher order column counts: Utilities for spanning column headers
  • Guaranteed pathability in row space
  • Nearest-value (SAS-like) rounding support: Maintaining consistency with existing processes
  • Statistical calculations: In accord with business logic
  • Robust approach for risk difference columns
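
To make the rounding point concrete: base R's round() resolves ties to the even digit, while SAS-style rounding is commonly described as rounding ties away from zero. The sketch below shows the difference. The helper name round_half_away and the small tolerance term are assumptions made for illustration; this is not junco's implementation.

```r
# Hypothetical helper: round half away from zero (SAS-like behaviour).
# The tolerance guards against values such as 1.005 that are stored
# slightly below the tie in floating point.
round_half_away <- function(x, digits = 0) {
  scale <- 10^digits
  tol <- sqrt(.Machine$double.eps)
  sign(x) * floor(abs(x) * scale + 0.5 + tol) / scale
}

# Base R sends ties to the even digit ...
base_result <- round(c(0.5, 2.5, -2.5))

# ... while the helper sends ties away from zero.
sas_like_result <- round_half_away(c(0.5, 2.5, -2.5))

print(base_result)
print(sas_like_result)
```

A table rounded with one rule can disagree with a SAS-produced reference table in the last digit, which is why the rounding rule has to be fixed at the framework level rather than left to each script.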

Limitations of Script-Based Solutions

  • Poor portability: Hardcoded paths and environment-specific settings
  • Code duplication: Functions often copied across multiple scripts
  • Reload overhead: Need to source entire scripts even when only one function is needed
  • No tooling: No built-in support for documentation or testing, and version tracking is difficult

Impact on Productivity

When standardized code is distributed as scripts, users spend significant time troubleshooting environment issues, manually tracking versions, and repeatedly reloading large files for small changes.

Creating the package
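
As a minimal sketch of what this step involves, the snippet below turns a single script into a package skeleton using only base R. The package name mytables, the function format_count_pct, and the temporary directory are all hypothetical; in practice usethis::create_package() is a common starting point instead.

```r
# Work in a throwaway directory so nothing is written to the project.
pkg_root <- file.path(tempdir(), "pkg-sketch")
dir.create(pkg_root, showWarnings = FALSE)

# A one-function "script" of the kind that gets copied between studies.
script <- file.path(pkg_root, "format_count_pct.R")
writeLines(c(
  "format_count_pct <- function(n, N) {",
  "  sprintf('%d/%d (%.1f%%)', n, N, 100 * n / N)",
  "}"
), script)

# Build the skeleton around the script: DESCRIPTION, NAMESPACE, R/, man/.
utils::package.skeleton(
  name = "mytables",
  code_files = script,
  path = pkg_root,
  force = TRUE
)

pkg_dir <- file.path(pkg_root, "mytables")
print(list.files(pkg_dir))
```

Once the function lives in R/ with a DESCRIPTION file next to it, it can be versioned, documented, tested, and installed like any other package.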


From Scripts to Standardized Tools


Industry Benefits

  • Maintain consistency across studies and therapeutic areas
  • Reduce time spent on repetitive coding tasks
  • Improve compliance with regulatory standards
  • Facilitate knowledge transfer between team members

Code Design and API for Users

Once we had the core functionality working, we needed to focus on:

  • User-friendly API: Creating intuitive functions that match users’ mental models
  • Consistent interfaces: Ensuring functions work together seamlessly
  • Clear documentation: Making the package accessible to non-developers

Note

API design is one of the most critical and challenging aspects of writing good code

Documentation

Using roxygen2 and pkgdown

Documentation is critical for package adoption and proper use:

  1. roxygen2 for function documentation:
#' Calculate risk difference with confidence intervals
#'
#' @param group1 Vector of outcomes for first group
#' @param group2 Vector of outcomes for second group
#' @param conf.level Confidence level (default: 0.95)
#' @return A list with risk difference and CIs
#' @export
risk_diff <- function(group1, group2, conf.level = 0.95) {
  # Function implementation
}
  2. pkgdown for website generation:
usethis::use_pkgdown()
pkgdown::build_site()
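
The body of the documented function is left open on the slide. One possible implementation, shown here only as a sketch, uses a Wald (normal-approximation) interval; the choice of interval, the name risk_diff_wald, and the 0/1 coding of outcomes are assumptions, not something the slide specifies.

```r
#' Risk difference with a Wald confidence interval (illustrative sketch)
#'
#' @param group1 Vector of binary outcomes (0/1 or logical) for first group
#' @param group2 Vector of binary outcomes (0/1 or logical) for second group
#' @param conf.level Confidence level (default: 0.95)
#' @return A list with the estimate, the confidence interval, and the level
risk_diff_wald <- function(group1, group2, conf.level = 0.95) {
  stopifnot(
    all(group1 %in% c(0, 1)),
    all(group2 %in% c(0, 1)),
    conf.level > 0, conf.level < 1
  )
  p1 <- mean(group1)
  p2 <- mean(group2)
  estimate <- p1 - p2
  # Standard error of a difference of two independent proportions
  se <- sqrt(p1 * (1 - p1) / length(group1) + p2 * (1 - p2) / length(group2))
  z <- stats::qnorm(1 - (1 - conf.level) / 2)
  list(
    estimate = estimate,
    conf.int = c(lower = estimate - z * se, upper = estimate + z * se),
    conf.level = conf.level
  )
}

res <- risk_diff_wald(c(1, 1, 1, 0), c(1, 0, 0, 0))
print(res$estimate)
print(res$conf.int)
```

The roxygen2 comment block sits directly above the function, so the documentation and the code are edited together and stay in sync.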

Ensuring Package Quality

Unit Testing and Validation

Quality assurance is essential for pharmaceutical applications:

  • Unit tests: Verify each function works as expected
  • Code coverage: The percentage of the package's code executed by the tests → junco's coverage
  • Integration tests: Ensure components work together correctly
# Basic unit testing example
add_values <- function(x, y) {
  return(x + y)
}

library(testthat)

test_that("1: addition works", {
  expect_equal(add_values(2, 2), 4)
})
Test passed 😀
test_that("2: we do expect an error", {
  expect_error(add_values("a", 1))
})
Test passed 🥳

Continuous Integration and Deployment

Automating Quality Checks

CI/CD reduces manual work while improving quality:

  • Automated testing: Run tests on every code change
  • Code coverage: Ensure comprehensive test coverage
  • Style checking: Maintain consistent coding standards
  • Documentation building: Keep documentation in sync with code

→ Our GitHub Actions
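
Roughly the same checks can be run locally before pushing. The helper below is a hypothetical convenience wrapper (run_local_checks is not part of junco or of any package named here); it assumes devtools, covr, lintr, and pkgdown are installed and that it is called from a package directory.

```r
# Hypothetical wrapper that mirrors the four CI steps listed above.
run_local_checks <- function(pkg = ".") {
  needed <- c("devtools", "covr", "lintr", "pkgdown")
  installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
  if (!all(installed)) {
    stop("Install these packages first: ",
         paste(needed[!installed], collapse = ", "))
  }
  list(
    tests    = devtools::test(pkg),          # automated testing
    coverage = covr::package_coverage(pkg),  # code coverage
    lints    = lintr::lint_package(pkg),     # style checking
    site     = pkgdown::build_site(pkg)      # documentation building
  )
}

# Usage, from the root of a package:
# results <- run_local_checks(".")
```

The CI service then only has to repeat what developers can already reproduce on their own machines.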

Note

The same checks can also run internally on Jenkins, but this requires much more effort and knowledge.

Advanced Topics

Version Control and Collaboration

Best Practices for Team Development

Effective collaboration requires good processes:

  • Semantic versioning: Major.Minor.Patch format (junco v0.1.1)
  • Git branching strategy: Feature branches and pull requests
  • Code review: Ensure quality and knowledge sharing
  • Issue tracking: Document bugs and feature requests → issue #45
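
As a small illustration of the Major.Minor.Patch scheme, base R can parse and compare version strings with package_version(). The bump_version helper below is hypothetical and written only for this example; usethis::use_version() is the tool typically used to update a package's DESCRIPTION.

```r
# Hypothetical helper: increment one component of a Major.Minor.Patch version
# and reset the components to its right.
bump_version <- function(version, part = c("patch", "minor", "major")) {
  part <- match.arg(part)
  v <- unclass(package_version(version))[[1]]
  stopifnot(length(v) == 3)
  bumped <- switch(part,
    major = c(v[1] + 1L, 0L, 0L),
    minor = c(v[1], v[2] + 1L, 0L),
    patch = c(v[1], v[2], v[3] + 1L)
  )
  paste(bumped, collapse = ".")
}

patch_release <- bump_version("0.1.1", "patch")
minor_release <- bump_version("0.1.1", "minor")
major_release <- bump_version("0.1.1", "major")
print(c(patch_release, minor_release, major_release))

# Versions compare component by component, not as text.
numeric_order_ok <- package_version("0.1.10") > package_version("0.1.9")
print(numeric_order_ok)
```

The component-wise comparison matters once a patch number reaches two digits: as plain strings "0.1.10" would sort before "0.1.9".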

Internal vs. External Packages

Choosing the Right Distribution Method

Different distribution methods serve different needs:

  • Internal packages: Company-specific methods, proprietary algorithms
  • GitHub packages: Community collaboration, rapid development
  • CRAN packages: General-purpose tools, widely applicable methods → integrable with SPACE and package validation processes

Pharmaceutical Considerations

Internal packages often contain proprietary methods or company-specific workflows that shouldn’t be publicly shared, while more general statistical methods may benefit from community review through CRAN or GitHub distribution.

Open Sourcing in Pharma

Why Joining Efforts is More Efficient

  • Standardization: Common tools lead to more consistent and comparable analyses
  • Reduced duplication: Companies avoid solving the same problems repeatedly
  • Shared maintenance burden: Multiple companies contribute to maintaining core infrastructure
  • Broader testing: Diverse use cases identify edge cases and bugs more effectively

Industry Transformation

The pharmaceutical industry is increasingly recognizing that pre-competitive collaboration on analytical tools benefits everyone. Projects like the R Validation Hub and Pharmaverse demonstrate how shared open source efforts can accelerate innovation while reducing costs.

Demo: Creating Complex Tables with Junco

Table Creation Script

library(junco)
library(dplyr)
library(pharmaverseadamjnj)

# Keep one ECG parameter and add the helper columns used for column headers
ADEG <- pharmaverseadamjnj::adeg |>
  select(STUDYID, USUBJID, TRT01A, PARAM, AVISIT, AVAL, CHG) |>
  filter(PARAM == "ECG Mean Heart Rate (beats/min)") |>

  mutate(colspan_trt = factor(
    if_else(TRT01A == "Placebo", " ", "Active Study Agent"),
    levels = c("Active Study Agent", " ")
  )) |>

  mutate(rrisk_header = "Risk Difference (%) (95% CI)") |>
  mutate(rrisk_label = paste(TRT01A, paste("vs", "Placebo")))

colspan_trt_map <- create_colspan_map(ADEG,
  non_active_grp = "Placebo",
  non_active_grp_span_lbl = " ",
  active_grp_span_lbl = "Active Study Agent",
  colspan_var = "colspan_trt",
  trt_var = "TRT01A"
)
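# Column path of the reference (Placebo) arm, used for the difference columns
ref_path <- c("colspan_trt", " ", "TRT01A", "Placebo")

# Layout: spanning header > treatment arm > statistic columns,
# followed by the difference-vs-Placebo columns
lyt <- basic_table() |>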
  split_cols_by(
    "colspan_trt",
    split_fun = trim_levels_to_map(map = colspan_trt_map)
  ) |>
  split_cols_by("TRT01A") |>
  split_rows_by(
    "PARAM",
    label_pos = "topleft",
    split_label = "Blood Pressure",
    section_div = " ",
    split_fun = drop_split_levels
  ) |>
  split_rows_by(
    "AVISIT",
    label_pos = "topleft",
    split_label = "Study Visit",
    split_fun = drop_split_levels,
    child_labels = "hidden"
  ) |>
  split_cols_by_multivar(
    c("AVAL", "AVAL", "CHG"),
    varlabels = c("n/N (%)", "Mean (CI)", "CFB (CI)")
  ) |>
  split_cols_by("rrisk_header", nested = FALSE) |>
  split_cols_by(
    "TRT01A",
    split_fun = remove_split_levels("Placebo"),
    labels_var = "rrisk_label"
  ) |>
  split_cols_by_multivar(c("CHG"), varlabels = c(" ")) |>
  analyze("STUDYID",
    afun = a_summarize_aval_chg_diff_j,
    extra_args = list(
      format_na_str = "-", d = 0,
      ref_path = ref_path, variables = list(arm = "TRT01A", covariates = NULL)
    )
  )

# Apply the layout to the data
result <- build_table(lyt, ADEG)

Rendered Table Output

rtables.officer::tt_to_flextable(result)

Active Study Agent / Apalutamide

Blood Pressure
  Study Visit                    n/N (%)          Mean (CI)              CFB (CI)
ECG Mean Heart Rate (beats/min)
  Baseline                       72/72 (100.0%)   319.5 (281.0, 358.0)
  Month 1                        72/72 (100.0%)   252.3 (208.0, 296.6)   -67.2 (-119.9, -14.4)
  Month 3                        72/72 (100.0%)   306.7 (262.3, 351.1)   -12.8 (-75.5, 50.0)
  Month 6                        68/68 (100.0%)   273.5 (234.4, 312.5)   -42.0 (-103.1, 19.1)
  Month 9                        56/56 (100.0%)   277.6 (233.0, 322.1)   -33.1 (-106.9, 40.7)
  Month 12                       50/50 (100.0%)   313.8 (265.4, 362.2)   -4.9 (-62.8, 52.9)
  Month 15                       37/37 (100.0%)   300.9 (245.4, 356.4)   -19.9 (-96.0, 56.1)
  Month 18                       32/32 (100.0%)   313.3 (241.1, 385.5)   5.4 (-90.7, 101.5)
  Month 24                       30/30 (100.0%)   296.0 (229.3, 362.7)   -15.3 (-118.6, 88.0)

Active Study Agent / Apalutamide Subgroup

Blood Pressure
  Study Visit                    n/N (%)          Mean (CI)              CFB (CI)
ECG Mean Heart Rate (beats/min)
  Baseline                       96/96 (100.0%)   313.6 (276.9, 350.3)
  Month 1                        94/94 (100.0%)   286.0 (247.1, 324.9)   -30.5 (-82.8, 21.7)
  Month 3                        73/73 (100.0%)   311.3 (268.5, 354.2)   2.4 (-58.8, 63.5)
  Month 6                        65/65 (100.0%)   281.5 (239.0, 324.0)   -34.9 (-95.9, 26.2)
  Month 9                        60/60 (100.0%)   312.4 (263.6, 361.2)   2.9 (-61.9, 67.6)
  Month 12                       52/52 (100.0%)   324.7 (273.6, 375.8)   21.9 (-46.0, 89.8)
  Month 15                       42/42 (100.0%)   279.9 (228.2, 331.5)   -13.2 (-92.8, 66.5)
  Month 18                       31/31 (100.0%)   272.7 (202.1, 343.3)   -43.9 (-146.8, 59.0)
  Month 24                       27/27 (100.0%)   373.7 (304.7, 442.6)   61.4 (-35.7, 158.6)

Placebo

Blood Pressure
  Study Visit                    n/N (%)          Mean (CI)              CFB (CI)
ECG Mean Heart Rate (beats/min)
  Baseline                       86/86 (100.0%)   258.2 (223.5, 292.9)
  Month 1                        84/84 (100.0%)   291.9 (253.4, 330.3)   34.0 (-16.0, 84.0)
  Month 3                        82/82 (100.0%)   283.8 (245.9, 321.6)   24.6 (-25.1, 74.4)
  Month 6                        76/76 (100.0%)   303.8 (261.4, 346.2)   40.7 (-19.8, 101.2)
  Month 9                        73/73 (100.0%)   310.9 (269.1, 352.6)   50.1 (-12.6, 112.8)
  Month 12                       69/69 (100.0%)   319.8 (277.6, 362.0)   59.5 (7.3, 111.7)
  Month 15                       68/68 (100.0%)   291.1 (251.0, 331.2)   29.4 (-32.9, 91.7)
  Month 18                       66/66 (100.0%)   288.2 (250.1, 326.3)   24.1 (-28.6, 76.9)
  Month 24                       59/59 (100.0%)   298.7 (254.3, 343.1)   33.6 (-25.8, 93.0)

Risk Difference (%) (95% CI)

Blood Pressure
  Study Visit                    Apalutamide vs Placebo    Apalutamide Subgroup vs Placebo
ECG Mean Heart Rate (beats/min)
  Baseline
  Month 1                        -101.2 (-173.3, -29.1)    -64.5 (-136.3, 7.3)
  Month 3                        -37.4 (-116.9, 42.1)      -22.2 (-100.4, 56.0)
  Month 6                        -82.7 (-167.9, 2.6)       -75.6 (-160.7, 9.6)
  Month 9                        -83.2 (-179.2, 12.7)      -47.3 (-136.5, 42.0)
  Month 12                       -64.5 (-141.6, 12.6)      -37.6 (-122.4, 47.2)
  Month 15                       -49.3 (-146.4, 47.7)      -42.6 (-142.5, 57.4)
  Month 18                       -18.7 (-127.2, 89.7)      -68.0 (-182.4, 46.4)
  Month 24                       -48.9 (-166.6, 68.8)      27.8 (-84.4, 140.1)

Summary

Key Takeaways


  • R packages provide a structured framework for organizing code
  • They improve reproducibility and reduce errors
  • Documentation and testing are essential components
  • Packages can streamline workflows in pharmaceutical research
  • The investment in package development pays off through reuse and reliability

Resources


Q&A



Thank You for Your Attention!

Have questions about R package development or the junco package?

Feel free to reach out using the contact information provided.

Contact Information